Visualizing bivariate long-tailed data
Authors
Abstract
Variables in large data sets from biology or e-commerce often have a head, made up of very frequent values, and a long tail of ever rarer values. Models such as the Zipf or Zipf–Mandelbrot law describe such variables well. The problem we address here is the visualization of two such long-tailed variables, as one might see in a bivariate Zipf context. We introduce a copula plot to display the joint behavior of such variables. The plot uses an empirical ordering of the data; we prove that this ordering is asymptotically accurate in a Zipf–Mandelbrot–Poisson model. We often see an association between entities at the head of one variable and those in the tail of the other. We present two generative models (saturation and bipartite preferential attachment) that show such qualitative behavior, and we characterize the power law behavior of the marginal distributions in these models.
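As a rough illustration of the idea of plotting two long-tailed variables under an empirical ordering, the following Python sketch draws labels from two Zipf–Mandelbrot laws, orders the entities of each variable from most to least frequent, and places every observation in the unit square. It is a minimal sketch under stated assumptions, not the authors' construction: the function names, the parameters s and q, the midpoint placement rule, and the independent sampling of the two variables are all choices made here for illustration.

```python
import numpy as np


def zipf_mandelbrot_pmf(n_items, s=1.2, q=2.0):
    """Probabilities p(r) proportional to (r + q)^(-s) for ranks r = 1..n_items.

    s and q are illustrative values, not taken from the abstract.
    """
    ranks = np.arange(1, n_items + 1)
    weights = (ranks + q) ** (-s)
    return weights / weights.sum()


def empirical_order_positions(labels):
    """Map each observation to a position in (0, 1) under the empirical ordering.

    Entities are sorted from most to least frequent. Each entity owns an
    interval whose width is its share of the observations, and every
    observation of that entity is placed at the interval midpoint
    (an assumed placement rule).
    """
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    order = np.argsort(-counts, kind="stable")  # most frequent entity first
    share = counts[order] / counts.sum()
    upper = np.cumsum(share)
    mid_sorted = upper - share / 2
    mid = np.empty_like(mid_sorted)
    mid[order] = mid_sorted  # back to the entity order used by np.unique
    return mid[inverse]


rng = np.random.default_rng(0)
n_obs = 5000
# Two long-tailed variables, sampled independently here for simplicity.
x = rng.choice(200, size=n_obs, p=zipf_mandelbrot_pmf(200))
y = rng.choice(300, size=n_obs, p=zipf_mandelbrot_pmf(300, s=1.5))

u = empirical_order_positions(x)
v = empirical_order_positions(y)

# A coarse 10 x 10 grid of counts; a heat map of this grid gives a crude
# picture of the joint behavior on the rank scale.
grid, _, _ = np.histogram2d(u, v, bins=10, range=[[0, 1], [0, 1]])
```

Because each axis is measured in shares of the data, every row and every column of the grid would hold roughly equal mass if the entity intervals lined up with the bin edges; structure in the grid then reflects the dependence between the two variables rather than their margins. With the independent draws used here, no strong structure is expected.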
Similar resources
Asymptotic Probabilities of an Exceedance over Renewal Thresholds with an Application to Risk Theory
Let $(Y_n, N_n)_{n \ge 1}$ be independent and identically distributed bivariate random variables such that the $N_n$ are positive with finite mean $\nu$ and the $Y_n$ have a common heavy-tailed distribution $F$. We consider the process $(Z_n)_{n \ge 1}$ defined by $Z_n = Y_n - \Sigma_{n-1}$, where $\Sigma_{n-1} = \sum_{k=1}^{n-1} N_k$. It is shown that the probability that the maximum $M = \max_{n \ge 1} Z_n$ exceeds $x$ is approximately $\nu^{-1} \int_x^{\infty} \overline{F}(u)\,du$, as $x \to \infty$, whe...
On Bivariate Generalized Exponential-Power Series Class of Distributions
In this paper, we introduce a new class of bivariate distributions by compounding the bivariate generalized exponential and power-series distributions. This new class contains the bivariate generalized exponential-Poisson, bivariate generalized exponential-logarithmic, bivariate generalized exponential-binomial and bivariate generalized exponential-negative binomial distributions as specia...
Double-conditional smoothing of high-frequency volatility surface in a spatial multiplicative component GARCH with random effects
This paper introduces a spatial framework for high-frequency returns and a faster double-conditional smoothing algorithm to carry out bivariate kernel estimation of the volatility surface. A spatial multiplicative component GARCH with random effects is proposed to deal with multiplicative random effects found from the data. It is shown that the probabilistic properties of the stochastic part an...
Distribution-free control chart for Bivariate Process
V.B. Ghute, Dept. of Statistics, Solapur University, Solapur, (MS), India; +91 9881372729. Abstract: A nonparametric or distribution-free control chart is useful in statistical process control when the underlying process distribution is unknown or is not likely to be normal. In...
Extremal Dependence: Internet Traffic Applications
For bivariate heavy tailed data, the extremes may carry distinctive dependence information not seen from moderate values. For example, a large value in one component may help cause a large value in the other. This is the idea behind the notion of extremal dependence. We discuss ways to detect and measure extremal dependence. We apply the techniques discussed to internet data and conclude that f...
Publication date: 2010